Dual-thread Weld: A Technique for Latency Tolerance in Horizontal Architectures
Abstract
This paper presents the dual-thread Weld architecture for VLIW/EPIC processors. The dual-thread Weld model supports one main thread and one speculative thread running simultaneously in a VLIW/EPIC processor, with a register file and a fetch unit per thread. The paper analyzes the cost-performance impact of the model, including the effect of migrating the disambiguation hardware for speculative memory operations to the compiler and the model's sensitivity to variation in branch misprediction and second-level cache miss penalties. The dual-thread Weld model achieves up to 35% speedup over a single-threaded VLIW/EPIC processor.
Similar resources
Weld for Itanium Processor
Sharma, Saurabh. Weld for Itanium Processor (under the direction of Dr. Thomas M. Conte). This dissertation extends WELD to Itanium processors. Emre Özer presented the WELD architecture in his Ph.D. thesis. WELD integrates multithreading support into an Itanium processor to hide run-time latency effects that cannot be determined by the compiler. Also, it proposes a hardware technique called operat...
Weld: A Multithreading Technique Towards Latency-Tolerant VLIW Processors
This paper presents a new architecture model, named Weld, for VLIW processors. Weld integrates multithreading support into a VLIW processor to hide run-time latency effects that cannot be determined by the compiler. It does this through a novel hardware technique called operation welding that merges operations from different threads to utilize the hardware resources more efficiently. Hardware c...
Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures
Multithreaded multiprocessor systems (MMS) have been proposed to tolerate long latencies for communication. This paper provides an analytical framework based on closed queueing networks to quantify and analyze the latency tolerance of multithreaded systems. We introduce a new metric, called the tolerance index, which quantifies the closeness of performance of the system to that of an ideal syst...
Performance Modeling of Multithreaded Distributed Memory Architectures
In multithreaded distributed memory architectures, long-latency memory operations and synchronization delays are tolerated by suspending the current thread and switching to another thread, which is executed concurrently with the long-latency operation of the suspended thread. Timed Petri nets are used to model several multithreaded architectures at the instruction and thread levels. Model eval...
Thread Pitch Variant in Orthodontic Mini-screws: A 3-D Finite Element Analysis
Orthodontic miniscrews are widely used as temporary anchorage devices to facilitate orthodontic movements. Miniscrew loosening is a common problem, which usually occurs during the first two weeks of treatment. Macrodesign can affect the stability of a miniscrew by changing its diameter, length, thread pitch, thread shape, tapering angle and so on. In this study, a 3-D finite element analysis wa...